49 research outputs found
Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Brain's Ventral Visual Pathway
Converging evidence suggests that the mammalian ventral visual pathway
encodes increasingly complex stimulus features in downstream areas. Using deep
convolutional neural networks, we can now quantitatively demonstrate that there
is indeed an explicit gradient for feature complexity in the ventral pathway of
the human brain. Our approach also allows stimulus features of increasing
complexity to be mapped across the human brain, providing an automated approach
to probing how representations are mapped across the cortical sheet. Finally,
we show that deep convolutional neural networks allow decoding of
representations in the human brain at a previously unattainable degree of
accuracy, providing a more sensitive window into the human brain.
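As an illustration of the mapping idea (a minimal sketch, not the authors' code), one can fit one linear encoding model per DNN layer and assign each voxel the layer whose features predict it best; all arrays below are hypothetical placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    n_stim, n_vox = 100, 50
    layer_feats = [rng.standard_normal((n_stim, d)) for d in (64, 128, 256)]  # placeholder per-layer DNN features
    voxels = rng.standard_normal((n_stim, n_vox))                             # placeholder fMRI responses

    def ridge_fit_score(X, Y, lam=1.0):
        # Fit ridge regression X -> Y; return per-voxel prediction correlation
        # (in-sample here for brevity; cross-validation would be used in practice).
        W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
        P = X @ W
        P = (P - P.mean(0)) / (P.std(0) + 1e-8)
        Z = (Y - Y.mean(0)) / (Y.std(0) + 1e-8)
        return (P * Z).mean(0)

    scores = np.stack([ridge_fit_score(F, voxels) for F in layer_feats])  # (n_layers, n_vox)
    preferred_layer = scores.argmax(0)  # per-voxel index of the best-predicting layer

The preferred-layer index plays the role of a feature-complexity value that can be projected back onto the cortical sheet.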
Modeling the dynamics of human brain activity with recurrent neural networks
Encoding models are used for predicting brain activity in response to sensory
stimuli with the objective of elucidating how sensory information is
represented in the brain. Encoding models typically comprise a nonlinear
transformation of stimuli to features (feature model) and a linear
transformation of features to responses (response model). While there has been
extensive work on developing better feature models, the work on developing
better response models has been rather limited. Here, we investigate the extent
to which recurrent neural network models can use their internal memories for
nonlinear processing of arbitrary feature sequences to predict feature-evoked
response sequences as measured by functional magnetic resonance imaging. We
show that the proposed recurrent neural network models can significantly
outperform established response models by accurately estimating long-term
dependencies that drive hemodynamic responses. The results open a new window
into modeling the dynamics of brain activity in response to sensory stimuli.
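A minimal sketch of such a recurrent response model, assuming a PyTorch setup with illustrative shapes: an LSTM consumes a stimulus-feature sequence, and its hidden state, which can carry long-term dependencies, is read out linearly into predicted voxel responses.

    import torch
    import torch.nn as nn

    class RNNResponseModel(nn.Module):
        def __init__(self, n_features, n_voxels, hidden=128):
            super().__init__()
            self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
            self.readout = nn.Linear(hidden, n_voxels)

        def forward(self, feats):            # feats: (batch, time, n_features)
            h, _ = self.rnn(feats)           # internal memory spans many time steps
            return self.readout(h)           # predicted responses: (batch, time, n_voxels)

    model = RNNResponseModel(n_features=512, n_voxels=1000)
    pred = model(torch.randn(2, 300, 512))   # e.g. a sequence of 300 fMRI volumes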
Semantic vector space models predict neural responses to complex visual stimuli
Encoding models aim to predict neural responses to naturalistic stimuli in
order to elucidate how sensory information is represented in the brain. This
prediction is achieved by representing the
stimulus in terms of a suitable feature space and using this feature space to
linearly predict observed neural responses. Here, we investigate to what extent
semantic vector space models can be used to predict neural responses to complex
visual stimuli. We show that these models provide good predictions of neural
responses in downstream visual areas, improving significantly over a low-level
control model based on Gabor wavelet pyramids. The outlined approach provides a
new way to model and map high-level semantic representations across cortex.
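The pipeline can be sketched as follows, with placeholder data standing in for real semantic vectors and recordings: each stimulus is represented by a semantic vector (e.g. a word embedding of its label), and a regularized linear map to voxel responses is fit and evaluated.

    import numpy as np

    rng = np.random.default_rng(1)
    n_train, n_test, dim, n_vox = 80, 20, 300, 40
    sem_train = rng.standard_normal((n_train, dim))   # semantic vectors, training stimuli
    sem_test = rng.standard_normal((n_test, dim))
    y_train = rng.standard_normal((n_train, n_vox))   # measured voxel responses (placeholder)
    y_test = rng.standard_normal((n_test, n_vox))

    lam = 10.0  # ridge penalty; in practice chosen by cross-validation
    W = np.linalg.solve(sem_train.T @ sem_train + lam * np.eye(dim), sem_train.T @ y_train)
    pred = sem_test @ W
    r = [np.corrcoef(pred[:, v], y_test[:, v])[0, 1] for v in range(n_vox)]  # per-voxel accuracy

The same held-out correlations computed for a Gabor wavelet feature space would provide the low-level control comparison.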
Brains on Beats
We developed task-optimized deep neural networks (DNNs) that achieved
state-of-the-art performance in different evaluation scenarios for automatic
music tagging. These DNNs were subsequently used to probe the neural
representations of music. Representational similarity analysis revealed the
existence of a representational gradient across the superior temporal gyrus
(STG). Anterior STG was shown to be more sensitive to low-level stimulus
features encoded in shallow DNN layers whereas posterior STG was shown to be
more sensitive to high-level stimulus features encoded in deep DNN layers.
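Representational similarity analysis of this kind can be sketched in a few lines; the data below are placeholders, and the metric choices are assumptions rather than the paper's exact settings.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(2)
    n_stim = 60
    layer_acts = rng.standard_normal((n_stim, 512))   # DNN layer activations per music clip
    stg_resps = rng.standard_normal((n_stim, 200))    # voxel responses in an STG region

    rdm_layer = pdist(layer_acts, metric="correlation")  # condensed dissimilarity matrix
    rdm_brain = pdist(stg_resps, metric="correlation")
    rho, _ = spearmanr(rdm_layer, rdm_brain)             # layer-region representational similarity

Repeating this for each DNN layer and each point along the STG reveals the anterior-to-posterior gradient described above.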
k-GANs: Ensemble of Generative Models with Semi-Discrete Optimal Transport
Generative adversarial networks (GANs) are the state of the art in generative
modeling. Unfortunately, most GAN methods are susceptible to mode collapse,
meaning that they tend to capture only a subset of the modes of the true
distribution. A possible way of dealing with this problem is to use an ensemble
of GANs, where (ideally) each network models a single mode. In this paper, we
introduce a principled method for training an ensemble of GANs using
semi-discrete optimal transport theory. In our approach, each generative
network models the transportation map between a point mass (Dirac measure) and
the restriction of the data distribution on a tile of a Voronoi tessellation
that is defined by the location of the point masses. We iteratively train the
generative networks and the point masses until convergence. The resulting
k-GANs algorithm has strong theoretical connection with the k-medoids
algorithm. In our experiments, we show that our ensemble method consistently
outperforms baseline GANs.
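The tessellation step behind this scheme can be sketched as follows; GAN training itself is omitted for brevity, and a k-means-style mean update stands in for the paper's medoid update.

    import numpy as np

    rng = np.random.default_rng(3)
    data = rng.standard_normal((1000, 2))                   # toy data with unknown modes
    k = 4
    masses = data[rng.choice(len(data), k, replace=False)]  # initial point masses

    for _ in range(10):
        d = ((data[:, None, :] - masses[None, :, :]) ** 2).sum(-1)
        tile = d.argmin(1)                            # Voronoi assignment of each point
        for j in range(k):
            if (tile == j).any():
                masses[j] = data[tile == j].mean(0)   # simplified update of the point mass
    # train_gan(data[tile == j]) would then fit one generator per tile j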
Deep adversarial neural decoding
Here, we present a novel approach to solve the problem of reconstructing
perceived stimuli from brain responses by combining probabilistic inference
with deep learning. Our approach first inverts the linear transformation from
latent features to brain responses with maximum a posteriori estimation and
then inverts the nonlinear transformation from perceived stimuli to latent
features with adversarial training of convolutional neural networks. We test
our approach with a functional magnetic resonance imaging experiment and show
that it can generate state-of-the-art reconstructions of perceived faces from
brain activations.
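The first stage can be sketched as a standard linear-Gaussian inversion; the matrices below are placeholders, the isotropic Gaussian assumptions are ours, and the second, adversarial stage (latent features to image) is omitted.

    import numpy as np

    rng = np.random.default_rng(4)
    n_vox, n_lat = 200, 50
    B = rng.standard_normal((n_vox, n_lat))       # learned linear map: latents -> responses
    y = rng.standard_normal(n_vox)                # observed brain responses (placeholder)
    sigma2, tau2 = 1.0, 1.0                       # noise and prior variances (assumed isotropic)

    # MAP estimate of the latent features: argmax_x p(y | x) p(x)
    A = B.T @ B / sigma2 + np.eye(n_lat) / tau2
    x_map = np.linalg.solve(A, B.T @ y / sigma2)
    # x_map would then be passed to the adversarially trained generator network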
Temporal Factorization of 3D Convolutional Kernels
3D convolutional neural networks are difficult to train because they are
parameter-expensive and data-hungry. To address these problems, we propose a
simple technique for learning 3D convolutional kernels efficiently while
requiring less training data. We achieve this by factorizing the 3D kernel
along the temporal dimension, reducing the number of parameters and making
training from data more efficient. Additionally, we introduce a novel dataset
called
Video-MNIST to demonstrate the performance of our method. Our method
significantly outperforms the conventional 3D convolution in the low data
regime (1 to 5 videos per class). Finally, our model achieves competitive
results in the high data regime (>10 videos per class) using up to 45% fewer
parameters.
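One plausible reading of the factorization, not necessarily the paper's exact parameterization, is to scale a shared 2D spatial filter by learned per-frame temporal weights, cutting the per-channel-pair parameter count from t*k*k to k*k + t.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FactorizedConv3d(nn.Module):
        def __init__(self, in_ch, out_ch, t, k):
            super().__init__()
            self.spatial = nn.Parameter(torch.randn(out_ch, in_ch, 1, k, k) * 0.02)
            self.temporal = nn.Parameter(torch.ones(out_ch, 1, t, 1, 1))

        def forward(self, x):                      # x: (batch, in_ch, time, H, W)
            kernel = self.spatial * self.temporal  # broadcasts to (out_ch, in_ch, t, k, k)
            return F.conv3d(x, kernel, padding=(1, 1, 1))

    layer = FactorizedConv3d(in_ch=3, out_ch=8, t=3, k=3)
    out = layer(torch.randn(1, 3, 16, 32, 32))     # (1, 8, 16, 32, 32)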
End-to-end semantic face segmentation with conditional random fields as convolutional, recurrent and adversarial networks
Recent years have seen a sharp increase in the number of related yet distinct
advances in semantic segmentation. Here, we tackle this problem by leveraging
the respective strengths of these advances. That is, we formulate a conditional
random field over a four-connected graph as end-to-end trainable convolutional
and recurrent networks, and estimate them via an adversarial process.
Importantly, our model learns not only unary potentials but also pairwise
potentials, while aggregating multi-scale contexts and controlling higher-order
inconsistencies. We evaluate our model on two standard benchmark datasets for
semantic face segmentation, achieving state-of-the-art results on both.
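A simplified mean-field update of this kind, with the CRF unrolled as a recurrent stack of convolutions, can be sketched as follows; the learned compatibility structure and the adversarial objective from the paper are omitted.

    import torch
    import torch.nn.functional as F

    def mean_field(unary, pairwise_kernel, n_iter=5):
        # unary: (batch, classes, H, W) CNN logits;
        # pairwise_kernel: (classes, classes, 3, 3) learnable pairwise potentials.
        q = F.softmax(unary, dim=1)
        for _ in range(n_iter):
            message = F.conv2d(q, pairwise_kernel, padding=1)  # beliefs from neighbors
            q = F.softmax(unary - message, dim=1)              # recurrent belief update
        return q

    unary = torch.randn(1, 11, 64, 64)          # e.g. 11 face-segment classes
    kernel = torch.randn(11, 11, 3, 3) * 0.1
    seg = mean_field(unary, kernel)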
Background Hardly Matters: Understanding Personality Attribution in Deep Residual Networks
The personality traits attributed to an individual need not correspond to
their actual personality traits and may be determined in part by the context
in which the person is encountered. These apparent traits determine, to a
large extent, how others behave towards that person. Deep neural
networks are increasingly used to perform automated personality
attribution (e.g., in job interviews). It is important that we understand the
driving factors behind the predictions, in humans and in deep neural networks.
This paper explicitly studies the effect of the image background on apparent
personality prediction while addressing two important confounds in the
existing literature: overlapping data splits and the inclusion of facial information
in the background. Surprisingly, we found no evidence that background
information improves model predictions for apparent personality traits. In
fact, when the background is explicitly added to the input, we measured a
decrease in performance across all models.
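The two input conditions at the heart of this contrast can be sketched with hypothetical tensors; in practice the person mask would come from a segmentation model rather than the hard-coded box used here.

    import torch

    image = torch.rand(3, 224, 224)              # input frame (placeholder)
    person_mask = torch.zeros(1, 224, 224)       # 1 where the person is (placeholder box)
    person_mask[:, 60:200, 80:160] = 1.0

    background_only = image * (1 - person_mask)  # person removed, context kept
    person_only = image * person_mask            # context removed, person kept
    # Feeding each condition to the attribution model isolates the background's
    # contribution to the predicted apparent-personality traits.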
Explainable 3D Convolutional Neural Networks by Learning Temporal Transformations
In this paper we introduce the temporally factorized 3D convolution (3TConv)
as an interpretable alternative to the regular 3D convolution (3DConv). In a
3TConv the 3D convolutional filter is obtained by learning a 2D filter and a
set of temporal transformation parameters, resulting in a sparse filter where
the 2D slices are sequentially dependent on each other in the temporal
dimension. We demonstrate that 3TConv learns temporal transformations that
afford a direct interpretation. The temporal parameters can be used in
combination with various existing 2D visualization methods. We also show that
insight about what the model learns can be achieved by analyzing the
transformation parameter statistics on a layer and model level. Finally, we
implicitly demonstrate that, in popular ConvNets, the 2DConv can be replaced
with a 3TConv and that the weights can be transferred to yield pretrained
3TConvs. Pretrained 3TConvNets leverage more than a decade of work on
traditional 2DConvNets by making use of features that have been proven to
deliver excellent results on image classification benchmarks.
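One way to realize this construction, under the assumption of a simple affine transformation family (the paper's exact family of temporal transformations may differ), is to warp a shared 2D filter per frame with affine_grid and grid_sample.

    import torch
    import torch.nn.functional as F

    out_ch, in_ch, t, k = 8, 3, 3, 5
    filt2d = torch.randn(out_ch, in_ch, k, k, requires_grad=True)      # shared 2D filter
    theta = torch.eye(2, 3).repeat(out_ch * t, 1, 1).requires_grad_()  # per-frame affine params

    # Warp the 2D filter once per output channel and frame, then stack the
    # warped slices along the temporal dimension to form a sparse 3D kernel.
    grid = F.affine_grid(theta, (out_ch * t, in_ch, k, k), align_corners=False)
    slices = F.grid_sample(filt2d.repeat_interleave(t, dim=0), grid,
                           align_corners=False)            # (out_ch * t, in_ch, k, k)
    kernel3d = slices.view(out_ch, t, in_ch, k, k).permute(0, 2, 1, 3, 4)
    out = F.conv3d(torch.randn(1, in_ch, 8, 32, 32), kernel3d, padding=(1, 2, 2))

Because only the 2D filter and a handful of transformation parameters are learned, the temporal parameters can be inspected directly, which is what affords the interpretability described above.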